A Statistical Perspective on Algorithmic Leveraging
Authors: Ping Ma, Michael W. Mahoney, Bin Yu
Abstract
One popular method for dealing with large-scale data sets is sampling. For example, by using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales rows/columns of data matrices to reduce the data size before performing computations on the subproblem. This method has been successful in improving computational efficiency of algorithms for matrix problems such as least-squares approximation, least absolute deviations approximation, and low-rank matrix approximation. Existing work has focused on algorithmic issues such as worst-case running times and numerical issues associated with providing high-quality implementations, but none of it addresses statistical aspects of this method. In this paper, we provide a simple yet effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model with a fixed number of predictors. In particular, for several versions of leverage-based sampling, we derive results for the bias and variance, both conditional and unconditional on the observed data. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results, when compared with uniform sampling. Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinkage” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). A detailed empirical evaluation of existing leverage-based methods as well as these two new methods is carried out on both synthetic and real data sets. 
The empirical results indicate that our theory is a good predictor of practical performance of existing and new leverage-based algorithms and that the new algorithms achieve improved performance. For example, with the same computation reduction as in the original algorithmic leveraging approach, our proposed SLEV typically leads to improved biases and variances both c...
©2015 Ping Ma, Michael W. Mahoney and Bin Yu.
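The sampling schemes named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`leverage_scores`, `sampling_probs`, `subsampled_ls`), the mixing weight `alpha`, and its default of 0.9 are assumptions made for illustration. SLEV is assumed here to be a convex combination of the leverage-based and uniform distributions, and LEVUNW is assumed to sample by leverage but skip the rescaling step; the abstract itself does not spell out either definition.

```python
import numpy as np


def leverage_scores(X):
    """Leverage scores: squared row norms of Q from a thin QR of X."""
    Q, _ = np.linalg.qr(X)  # reduced mode: Q has the same shape as X
    return np.sum(Q ** 2, axis=1)


def sampling_probs(X, method="lev", alpha=0.9):
    """Row-sampling distribution for the chosen scheme (names are illustrative)."""
    n = X.shape[0]
    unif = np.full(n, 1.0 / n)
    if method == "unif":
        return unif
    h = leverage_scores(X)
    lev = h / h.sum()
    if method in ("lev", "levunw"):
        return lev
    if method == "slev":
        # Assumed form of "shrinkage" leverage: mix leverage with uniform.
        return alpha * lev + (1.0 - alpha) * unif
    raise ValueError(f"unknown method: {method}")


def subsampled_ls(X, y, r, method="lev", alpha=0.9, rng=None):
    """Draw r rows with replacement, then solve the smaller least-squares problem."""
    rng = np.random.default_rng(rng)
    p = sampling_probs(X, method, alpha)
    idx = rng.choice(X.shape[0], size=r, replace=True, p=p)
    Xs, ys = X[idx], y[idx]
    if method != "levunw":
        # Rescale sampled rows by 1/sqrt(r * p_i): a weighted subproblem.
        w = 1.0 / np.sqrt(r * p[idx])
        Xs = Xs * w[:, None]
        ys = ys * w
    # For "levunw" the sampled rows are used as-is (unweighted subproblem).
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    return beta
```

Calling `subsampled_ls(X, y, r=200, method="slev")` on an `n x p` design matrix `X` and response `y` returns a length-`p` coefficient estimate computed from 200 sampled rows. Repeating the call over many random seeds gives a rough empirical view of the bias and variance that the abstract compares across schemes.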
Similar resources
Probabilistic Sufficiency and Algorithmic Sufficiency from the point of view of Information Theory
Given the importance of Markov chains in information theory, conditional probability for these random processes can also be defined in terms of mutual information. In this paper, the relationship between the concept of sufficiency and Markov chains from the perspective of information theory and the relationship between probabilistic sufficiency and algorithmic sufficien...
The Role of Algorithmic Applications in the Development of Architectural Forms (Case Study: Nine High-Rise Buildings)
The process of developing architectural forms has greatly been changed by advances in digital technology, especially in design tools and applications. In recent years, the advent of graphical scripting languages in the design process has profoundly affected 3D modeling. Scripting languages help develop algorithms and geometrical grammar of shapes based on their constituent parameters. This stud...
Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares
We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices constructed from random projections and leverage-score sampling, if the number of samples r is much smaller than the original sample size n, then the worst-case (WC) error is...
Statistical Signal Processing for Graphs by Nadya Travinin Bliss A Dissertation Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy Approved April 2015 by the Graduate Supervisory Committee: Manfred Laubichler, Co-Chair Carlos Castillo-Chavez, Co-Chair
Analysis of social networks has the potential to provide insights into a wide range of applications. As datasets continue to grow, a key challenge is the lack of a widely applicable algorithmic framework for detection of statistically anomalous networks and network properties. Unlike traditional signal processing, where models of truth or empirical verification and background data exist and are o...
An Algorithmic Approach to Collective Behavior
The emergence of collective structure from the decentralized interaction of autonomous agents remains, with notable exceptions, a mystery. While powerful tools from dynamics and statistical mechanics have been brought to bear, sometimes with great success, an algorithmic perspective has been lacking. Viewing collective behavior through the lens of natural algorithms offers potential benefits. T...
Publication date: 2014